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ABSTRACT 

Based on the Beylkin-Cramer summation rule, we introduce a new fast al¬ 
gorithm that enable us to explore the high order statistics efficiently in large 
data sets. Central to this technique is to make decomposition both of helds and 
operators within the framework of multi-resolution analysis (MRA), and realize 
theirs discrete representations. Accordingly, a homogenous point process could 
be equivalently described by a operation of a Toeplitz matrix on a vector, which 
is accomplished by making use of fast Fourier transformation. The algorithm 
could be applied widely in the cosmic statistics to tackle large data sets. Espe¬ 
cially, we demonstrate this novel technique using the spherical, cubic and cylinder 
counts in cells respectively. The numerical test shows that the algorithm pro¬ 
duces an excellent agreement with the expected results. Moreover, the algorithm 
introduces naturally a sharp-hlter, which is capable of suppressing shot noise in 
weak signals. In the numerical procedures, the algorithm is somewhat similar to 
particle-mesh (PM) methods in N-body simulations. As scaled with 0{N\ogN), 
it is signihcantly faster than the current particle-based methods, and its com¬ 
putational cost does not relies on shape or size of sampling cells. In addition, 
based on this technique, we propose further a simple fast scheme to compute the 
second statistics for cosmic density helds and justify it using simulation samples. 
Hopefully, the technique developed here allows us to make a comprehensive study 
of non-Guassianity of the cosmic helds in high precision cosmology. A specihc 
implementation of the algorithm is publicly available upon request to the author. 
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1. Introduction 


The data explosion in observational cosmology has been seen in past decade. Though 
the rapid increasing both in quantity and quality allow us to make a great improvement in 
measuring the global properties of the universe and galaxy clustering, it poses new challenges 
to extracting information from enormous data sets. For instance, the early galaxy redshift 
survey published the catalog only of a few thousand galaxies( QDOT; Lawrence et ah 1996, 
Berkeley 1.2Jy; Fisher et al. 1995). So far, the number has been raised to around 250,000 
in the Two Degree Field Galaxy Redshift Survey (2dFGRS) and 10® in the Sloan digital sky 
survey SDSS). The upcoming galaxy redshift survey in a few years will reach to 10^ by the 
Large Sky Area Multi-object Fibre Spectroscope Telescope (LAMOST). Moreover, the state 
of arts for cosmological numerical simulations are capable of producing galaxy clustering 
models with 2160^ ~ 1.0078 x 10^® GDM particles in the millennium simulation (Springer, 
et ah, 2005), which have made us in position to address some interesting astrophysical 
problems, however, amenable to currently available numerical technique, it is not always 
possible to make all of desirable statistical measurements in those considerably large data. 

The two-point correlation function and its Fourier counterpart, the power spectrum are 
basic statistic measure to quantify the large scale structure of the universe. The standard 
optimal estimator of the correlation function (Landy & Szalay 1993, Szapudi & Szalay, 
1998) requires 0{N‘^) operations through counting pairs of particles cycling over the whole 
sample, where N is the number of objects. For a large N, the GPU time in the pair¬ 
counting is a paramount consideration in practice. In limited case, e.g. while measuring 
the correlation function on small scales or reasonable large bins, the pair-counting could be 
sped up to 0{NlogN) by the double-tree algorithm (Moore et al. 2001). But this algorithm 
degenerates to the naive 0{N'^) if all of scales are taken into account. Gomplementary to the 
tree-based algorithm, Szapudi et al. (2005) proposed a grid-based algorithm using the FFT 
technique. In computation, it is scaled as 0{Ng\ogNg) on all the scales though it would be 
ideal for large scales, here Ng is the number of grids. 

The cosmic statistics higher than the second order characterizes the Non-Gaussian fea¬ 
tures developed in the highly non-linear gravitational clustering processes and provide a 
powerful probe to gravitational collapse and merging of massive halos on small scales. For 
instance, they could be applied to discriminate various clustering models beyond the second 
order. And also, the encoded information of biasing effects in the hierarchical clustering 
scenario could be extracted from the higher order statistics to place an extra constraints 
on the models. Recently, the rapid increasing quantity and quality of both the 2dFGRS 
and SDSS have stimulate great efforts in measuring high order statistics with signihcantly 
improved accuracy. For instance, Jing & Borner (2004) measured the three-point correlation 
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function in an early release 2dF100K sample, Wang et al (2004) devoted to the same subject 
to test the conditional luminosity function model. Other approaches included estimating the 
biasing factor from the bispectrum (Verde et al, 2002) and moments of counts-in-cells in the 
2dFGRS(Croton et al, 2004a,b). 

However, the higher order statistics relies on large number of parameters in the con¬ 
figuration space, which complicates both analytical studies and practical measurements. 
Among those, the counts-in-cells (CIC) and relevant conditional cumulants (Szapudi 2004, 
2005) provide a relative simple non-Gaussian indicator. Especially in the latter, it could be 
understood as degenerate N-point correlation functions, or integrals of the monopole mo¬ 
ment of the bispectrum . While for the conventional CIC measurement using cubic cells, to 
achieve reliable statistical signihcance, it required a large number of random sampling and 
counting. To speed up measurements, several fast methods which are optimal for different 
purposes have been developed. In addition, an alternative approach to cosmic statistics is 
using discrete wavelet transformation (DWT) for cosmic fields, which enable us to probe the 
clustering features both in the position and wavenumber spaces simultaneously. The DWT 
method offers some sensitive statistical detectors to reveal the non-Gaussian features of helds 
(Feng, Deng & Fang, 2000, Feng & Fang 2000, Feng, Pando & Fang 2001). For a compre¬ 
hensive review of the high order cosmological perturbation theory, we refer to Bernardeau 
et al.(2002) and on the high order statistical methods, refer to Szapudi (2005) . 

This paper is to describe a new fast algorithm based on newly developed technique 
in numerical mathematics (Beylkin & Cramer, 2002). Actually, the algorithm is realized 
making using of a fast summation rule within the framework of multi-resolution analysis 
(MRA) (Beylkin & Cramer, 2002), and could be applied widely to point processes with the 
given hltered kernels. Furthermore, with the MRA-CS scheme, we proposed a new method 
for calculations of the second-order statistical measures such as the two-point correlation 
function and variances of density fluctuations. We also noticed the mostly recent paper by 
Thacker & Couchmann (2005), where a fast statistical measurement technique was proposed. 
Their approach is similar to the technique described in our paper, however, they adopted 
the 3rd order B-spline as an assignment function for mass partition. We here present a more 
general method based on the well-proven mathematics. 

The paper is organized as follow, in §2, we give a general formulation of the new fast 
algorithm. In particular, a new simple algorithm for measuring the second order statistics 
is described in §2.3. §3 presents numerical tests of the MRA-CS algorithm using N-body 
simulations. Finally, we summary the paper and give concluding remarks. In addition, as 
we make use of B-spline biorthogonal bases in implementation of the algorithm, some useful 
properties of B-splines are briefed in Appendix A. 
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2. General Formulae 
2.1. Filtered Density Fields 

The 3-dimensional distribution of galaxies can be modeled by a spatial point process 
with density n(x) , which is formally written as a sum of Dirac delta function 5f)(x), 

N 

= 5^Wi5i,(x-Xi) (1) 

i=0 

where N is the total number of samplinggalaxies, Xj is the position of the ith particle and Wi 
is its weight. For a flux limited galaxy redshift survey, Wi is given by the luminosity selection 
function. For most of measurable statistical quantities in the sample, we need to evaluate a 
discrete sum of 

N 

nw{y^) = ^^WiW{yi-yii) ( 2 ) 

i=l 

where W (x) is a kernel. Alternatively, the above summation can be written in terms of the 
original density distribution n(x) convolved with the kernel W (x) 

nw(x) = j lF(x — x')n(x')(i^x' (3) 

In the wavenumber space, the Fourier image reads 

hvr(k) = fF(k)h(k) (4) 

Here we list some kernels frequently used in statistical analyses of the cosmic density 
helds. 

(1) Spherical top hat: 

hFspfcere(^, R) = j~ 

where d{r) is the Heaviside step function, R is the hlter radius. The Fourier counterpart is 

Wsphere{k, R) = -^^{sin{kR) - kRcos{kR)) (6) 

Unlike a sharp cutoff at the radius R in real space, top hat hlter in k-space is rather diffuse. 
This hlter smooths over a hnite spherical volume of radius R around a point. Actually, it 
gives a volume-averaged number density of particles, and thus counts the total number of 
particles in a given spherical cell. 
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(2) Spherical shell top hat 




( 7 ) 


which dehnes the average surface density in the sphere shell at distance R from a given point. 
Its Fourier transformation gives 

sin(A;r) 


Wsheii{.k,r) = 


kr 


( 8 ) 


Obviously, it is related to the spherical top hat both in the real and wavenumber space by 

(9) 


Wshell{-,R) — -^^-^{R^Wsphere{--,R)) 


dR 

For the statistical estimator of the two-point correlation function, it is a basic quantity to 
be measured. 


(3) Cubic top hat: 


L) 


LxLyLz 


e{Lx/2 - \x\)e{Ly/2 - \y\)e{Lj2 - |;^|) 


and in the /c-space 


L) 


sm.{kxLx) sm.{kyLy) sm.{kzLz 

h T, 

l^y-Uy 


( 10 ) 


( 11 ) 


kxLx kyLy kzLz 
where the vector L denoting for {Lx, Ly, Lz} dehnes the side length of cubic. Similar to the 
spherical top hat, the cubic one gives a volume-averaged density in a given cubic. 


(4) Cylinder top hat: 


Wcyiinderip, z, R, k) = - p)9{h/2 - |^|) (12) 

where h is the height along axis of cylinder, R is the radius of circular cross section. In the 
/c-space, we have 

^ siiif/c /i/2l 

Wcyiinder{k±,kz,R,h) = —— J Jo{k±RVx)dx (13) 

where k± = ^/k^ + ky 

(5) Gaussian Filter: 

The Gaussian hlter behaves in the same way both in the coordinate space and in the 
wavenumber space, 

1 

WG(r,R) = — , —exp(-;^) (14) 

Wait, H) = 


( 15 ) 
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2.2. The Beylkin-Cramer Fast Summation Rule in Multiresolution Analysis 

Central to multiresolution analysis (MRA) (Daubechies, 1992; Fang & Thews, 1998) is 
to express an arbitrary function at various levels of the spatial resolutions, which forms a 
sequence of functional spaces Oc-"CF_iCVoC-"C T^(R) . Suppose a set of functions 
{(t){x — A;)|A; G Z} forms an orthonormal basis for Vq, dilated by a scale 2^ and translated by 
2~^k yields an orthogonal basis for Vj, 

{(pj,k{x) = -k) \keZ} (16) 

where 0 is called the basic scaling function. Projection of a function / G T^(R) onto Vj is 
approximation to / at the scale j, and converges to / as increasing j ^ oo. Without loss of 
generality, all of formulae hereafter are written in one-dimensional forms. The generalization 
to multi-dimensional space is straightforward. 

In terms of scaling functions, we may make a decomposition of the density distribution 
WiS{x — Xi) in the MRA at a scale j. 

= (17) 

i 

in which the scaling function coefficients (SFCs) {s;(} are given by the inner product 

sj = J n{x)(j)jj{x)dx (18) 

Np 

= (19) 

i=\ 

which describes the density fluctuation hltered on the scale of 1/2-^ at position 1/2^ (Fang & 
Thews, 1998, Feng & Fang, 2004). Usually, the scaling functions is chosen to have a compact 
support. In this case, the summation of cycling through the whole sample reduces to sum 
up the contributions from neighbouring particles around the position 1. 

Similarly, for a kernel W{x, y), projection onto Vj yields a multiresolution representation 
W — Wj, which has the form (Beylkin & Cramer, 2002), 

WjA, 1 /) = wl^<Pj,iA)<Pj,m{y) ( 20 ) 

l,m 


W 


j 

l,m 


I 


where 


\V(x, y)<t)xi(x)<l>j,m(v)dxdy 


( 21 ) 
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Substituting eq.(17) and eq.(20) into eq.(3), we obtain the filtered density field at the scale 
j as 

nw{x) ^ nly(x) = (22) 

i 

with 

=Y1 (23) 

m 

For a homogenous kernel W, = wj_^ is a Toeplitz matrix, and conventionally its 

operation on the vector s-^ = in eq.(23) could be accomplished using the fast Fourier 
transformation technique (FFT). 

As described above, we have a fast algorithm based on the multiresolution analysis, 
which will be hereafter called MRA-CS (Multi-Resolution Analysis for Cosmic Statistics). 
Practically, once the orthonormal basis functions are specihed, the algorithm can be imple¬ 
mented in the following four steps: 

(1) computing the SFCs of the density held by the summation eq.(18); 

(2) at a given scale j, computing the coefficients that represent the kernel W in 
eq.(21), and then making discrete Fourier transformation of to get the Green function 

in the wavenumber space; 

(3) performing multiplications of the Toeplitz matrix wj_^ to the vector via the 
FFT; 

(4) at any points, values of the hltered density held nw{x) can be read out simply by 
the summation eq.(22). 


2.3. The Fast Algorithm for Two-Point Correlation Function 

The two-point spatial correlation function ,^(r) =< (5(x)(5(x -|- r) > of a density held 
describes the lowest order statistical measure of departure of the matter distribution from 
homogeneity. It is Fourier conjugate of the power spectrum P{k) =< |5(k)p >, 

f - 

ar) = Wsheiiik, r)P{k)kHk (24) 

In practical ststistical analysis, there have been several edge-corrected estimators of the two- 
point correlation function proposed so far. The widely used one is that given by Tandy and 
Szalay, which takes the form 

- , , DD — 2DR RR 

^Ls{r) = 


RR 


( 25 ) 



where DD is the count of pairs in the catalog within the interval [r — 0.5dr, r + 0.5dr], RR is 
the number of pairs in a random sample, and DR the number of pairs between the catalog 
and the binomial random sample in the same interval. 

Based on the MRA-CS technique, we describe here a fast and simple algorithm to 
estimate the number of pairs. Firstly, we consider counting the DD-pair in the catalog. 
According to eq.(22), the pair count in the shell of (r, r + dr) from a particle i is approximated 
by 

nivixi) = dV'^sj4>j^i{xi) (26) 

i 

where dV = Aur'^dr is the differential volume element of spherical shell if dr is small enough. 
Since dV is a common factor for the pair counts DD, DR and RR, it will be dropped 
hereafter. Taking the sum over the whole sample, the total number of pairs is 

N 

DD = (27) 

i=l I 

which can be written in an equivalent form, 

DD = ^^s\ / dx4>j^i{x)n{x) (28) 

i 

where n{x) is the number density given by eq.(l). Using the decomposition eq.(17) and 
orthogonality of the base functions we have 

DD = Y^ sjsj = ■ s^' (29) 

i 

The above equation means that counting the number of pairs can be simplihed to a vector 
product of = {sj} and s-^ = The similar algorithm can be applied for the DR and 

RR pairs. For instance, in counting the DR pairs, we may still use eq.(29), but {s/} are 
given instead by the decomposition coefficients calculated for the given random sample. 

We emphasis here that, in eqs.(25)-(28), sj is obtained from a convolution of sj with 
the sphereical shell top hat Wsheu{k,r). Actually, this algorithm allows us to calculate the 
correlation function without counting particles in a series of spherical shells around each 
particle. Namely, it omits binning in the radial direction of spheres, and thus is bin width 
independent. 

On the other hand, if we use the sphereical top hat Wgpherei fhe pair-counting is per¬ 
formed in the sphere of radius r centered on each particle. In this case, the correlation 
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function estimated from eq.(25) is just what we refer as the integral or volume-averaged 
two-point correlation function, which is related to .^(r) by 

= 4 / ^{x)x‘^dx (30) 

Jo 

Analogue to eq.(24), we have 

Wsphere{k,r)P{k)k‘^dk (31) 

where Wsphere{k,r) is the spherical top hat window function given by (eq.(6)). 

For the second order variance of hltered density fluctuations 

A-) =< ^ ■)rP(A^)rf=k (32) 

where the dot ■ denotes for a set of parameters specifying the spatial shape of the window 
function. Similar to the above derivation, we may show that it can be worked out simply by 

^^(■) = = |s^r - 1 (33) 

i 

Obviously, in the spirit of this method, it is possible to generalize the MRA-CS algo¬ 
rithm for higher order statistics. In that case, we need to deal with integrals like = 
I 4>j,i4>j,m4>j,ndx, which are currently referred as the connection coefficients. For the detail 
discussions, we will give in the forthcoming paper. 


3. Numerical Tests 

In the last section, we have described a fast algorithm MRA-CS for hltered cosmic 
helds. The direct application is the counts-in-cells with arbitrary geometries and the second 
order statistical measures such as the two-point correlation funcction and the variance of 
hltered density helds. To justify to what extent this numerical scheme works well, we make 
numerical tests using N-body simulation samples. We analyzed the z = 0 output of N-body 
simulations by the Virgo Consortium (Kauhman et al. 1999) in a LCDM model specihed by 
the cosmological parameters Qm = 0.3, Ha = 0.7, F = 0.21, h = 0.7, ag = 0.85 and force soft 
length 20h“^kpc. The side length of the simulation box was 239.5h~^Mpc, and the number 
of CDM particles was 256^. 

Analytical form of the MRA-CS scheme presented in the last section is based on the 
multi-resolution analysis using compactly supported, orthonormal bases involving a single 
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scaling function. In practical applications, we can alternatively adopt a biorthogonal system 
including a pair of dual scaling functions. In this paper, we choose the central B-splines in 
the implementation. As the B-splines do not form an orthogonal basis, they may be parts 
of a biorthogonal system. The corresponding dual scaling function could be constructed as 
that spanned by the B-splines, and is not compactly supported. In this case, some changes 
need to be made in the algorithm. For this purpose, we present some useful properties and 
formulae of the B-splines in Appendix A. 

The naive measurement of counts in cells can be made by direct counting. To test the 
MRA-CS algorithm, a direct method is to compare results obtained by the MRA-CS with 
the exact counting. In what follow, we perform numerical tests for the spherical, cubic and 
cyclinder counts in cells, respectively. In the calculations, we applied the 5th order B-splines. 
Meanwhile, alternative to the method described in Appendix.2, according to eqn.(AlO), we 
adopted the Green function simply by W{2t[^) ■ lyp = W/a{^)Y tabulating on a 
discrete grid, where /3(.^) and a{^) are given by eqs.(A5) and (A8) respectively. 

In the hrst numerical experiment, we randomly placed a sphere center in a poisson 
sample with 256^ particles in a 256^ grid, and then measure the numbers of particles within 
a sequence of the concentric spherical shells at different radius. Figure 1 plots the volume- 
average density vs. radius for three randomly placed concentric sphere shells. Clearly, the 
results calculated from our fast algorithm shows an excellent agreement with the exact values. 

We perform further numerical experiment to test the counts-in-cells with different ge¬ 
ometries using the Virgo simulation sample in a 512^ grid. We have done two sets of the 
counts-in-cells tests. One was made for hxing the center of cells in a selected particle within 
a massive halo, and another is for a low density region. The results for the counts in cubic 
cells are demonstrated in hgure 2, where each panel is for cubes with a specihc value of the 
aspect ratio. The horizonal axis is denoted by the equivalent radius of spheres with the same 
volumes as the cubic cells. Obviously, except for those at scales of a few grids, the MRA-CS 
algorithm matches with the exact counting very well. Similar results are displayed in hgure 
3 for the spherical and cylinder counts-in-cells. 

It is worthwhile to note that, at small scales, there are visible differences between the 
MRA-CS and exact counting. Actually, in the MRA-CS scheme, the Fourier transformation 
of the B-spline of order n is a low pass hlter with a sharp suppression of power. Con¬ 

sequently, the Poisson shot noise and huctuations at short wavelengths will be suppressed. 
This kind of behavior can be viewed clearly in both hgure 2 and hgure 3. In case for the 
counts-in-cells within the massive halo (the upper curve in each panel), as the ehect of 
shot-noise is not signihcant at small scales, such a suppression leads to the volume-average 
densities calculated from the MRA-CS scheme being systematically lower than the direct 
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counting as indicated in figure 2. While in the less dense region(the lower curve in each 
panel), the shot noise dominates at small scales, there are significant fiuctuations in the 
volume-averaged density obtained by direct counting. In comparison, the MRA-CS gives 
slightly smooth variations of the density with radius. 

To have a visual inspection of accuracy of the MRA-CS method, we measure the counts- 
in-cells using 256^ spherical cells randomly thrown in the simulation sample, and make a 
scatter plot in figure 4 to compare the volume-averaged density calculated from the MRA-CS 
versus those by the direct counting. In order for a good view of errors, the volume-averaged 
densities are plotted in logarithmic scales. The upper and lower panel display the results for 
the cell sizes of R = 5h“^Mpc and R = 10h“^Mpc, respectively. Figure 4 shows more clearly 
the effects of shot noise occurred in both cases. In the case R = 5h“^Mpc, the larger scatters 
appear in low density regions where sparse sampling makes shot noise more significant than 
in dense regions. Increasing the size of cell will lead to, on average, more sampling particles 
in cells, and consequently, the overall scatters tend to be squashed rather smaller. In other 
words, for a given density, large cells contain more particles than those in small cells, and 
the effect of Poisson noise in the large cells should be less significant than in the small cells 
accordingly. This kind of phenomena has been actually illustrated in figure 4 by comparing 
the case of R = 5h“^Mpc (upper panel) with R = 10h“^Mpc (lower panel). 

Moreover, in figure 5, we display the probability distribution functions (PDF) of the 
counts in cells for R = 5h“^Mpc and R = 10h“^Mpc with three different shapes. In calcula¬ 
tions, we used 180 bins to make the PDFs. Basically, there is an excellent agreement of the 
MRA-CS calculation and the direct counting. In the case R = 5h“^Mpc, the PDFs compiled 
from the direct counting shows slight fiuctuations in the low density regions pv < 0.5 due 
to the effect of shot noise. In comparison, such fiuctuations have been smoothed out in the 
MRA-CS calculations. Whereas in case of R = 10h“^Mpc, the direct counting and MRA-CS 
shows a very good agreement as shown in figure 5. It indicates again that the shot noise 
could be reduced with increasing cell sizes. This demonstration does show how a reliable 
statistics could be worked out through the MRA-CS technique. 

In further applications of the MRA-CS scheme, we measured the second order statistics 
in the Virgo sample using the fast algorithm described in §2.3. A 512^ grid has been used, 
and the physical grid size is thus 0.47h“^Mpc. The results include the spatial two-point 
correlation function in figure 6, the volume-averaged two-point correlation function in figure 
7, the variance of density fiuctuations by the top-hat filter in figure 8 and the Gaussian filter 
in figure 9, respectively. In all figures, we also plot the theoretical curves using the fitting 
formulae of nonlinear power spectrum in LCDM models (Smith et al, 2003). The agreement 
of our fast algorithm with the theoretical model looks quite perfect in scales down to the 
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grid size. 

The current implementation of the MRA-CS algorithm is written in Fortran-90. Com¬ 
piling on a SGI Altix350 workstation of 16 Intel® Itanium 2 1.5G CPUs with the OPENMP 
parallelization, it runs at the speed of 1.6 seconds per 256^ counts-in-cells using B-splines of 
degree n = 5. By the scaling, a massive counts-in-cells measurement with 10® sampling cells 
could be done with 2 minutes in the same machine. In the computations, the Intel® Math 
Kernel Library (MKL) has been also applied to perform the fast Fourier transformations. 

Finally, it is necessary to point out that, the central idea in the MRA-CS scheme is to 
End a finite and smooth representation of discrete particle distribution, which is a singular 
function as given by the Dirac delta function in eqn.(l). In principle, the higher is order 
of B-splines, the better is approximation to the Dirac function (see Appendix 3). However, 
as computational cost is scaled as (n -|- 1)'^ where n is the order of B-splines and d is the 
dimension of space, the higher order B-splines may leads to less efficient in the algorithm. To 
set up a compromise between them, we have numerically tested the dependence of accuracy 
on orders of B-splines. The result indicates that for n > 5, there are no visual differences 
at scales larger than the grid size. Accordingly, we have used the fifth order of B-splines 
throughout this paper. 


4. Discussions and Conclnding Remarks 

We have described a fast algorithm of cosmic statistics based on multiresolution analysis, 
in which fields or operators are decomposed in term of a set of orthonormal compactly 
supported bases at various levels of details. This method naturally leads to a fast summation 
rule, and could find wide applications in different areas. In application to cosmic statistics, we 
made an implementation of the MRA-CS scheme using high order B-spline biorthogonal bases 
and justified its efficiency and reliability by the numerical experiments. Using spherical,cubic 
and cylinder counts in cells, we found that the MRA technique yields good estimations of 
the counts-in-cells. On small scales, there appear some discrepancies in comparison with 
direct counting. In statistics, it is not negative. Actually, weak signals in a point process 
suffer from the Poisson shot noise. However, in the MRA-CS scheme, those high frequency 
shot noises could be suppressed by the virtue of the fast decaying behavior of sharp low pass 
Liters. 

Based on the MRA-CS scheme, we proposed further a fast algorithm to estimate the two- 
point spatial correlation function. Strikingly, we proved that, taking average of pair counts 
over a sample can be simplihed to a vector product. That is, the averaged pair counts 



can be obtained without counting. Is is also shown that this method can be generalized 
to calculate other second order statistical quantities such as the variances of cosmic helds 
hltered with arbitrary window functions. Using the simulation sample, we demonstrate an 
excellent agreement between our MRA-CS method and the theoretical expectation. 

The MRA-CS algorithm has several advantages, 

(1) It is a grid-based algorithm making use of the FFT technique, and thus is signihcantly 
faster than the conventional counts in cells methods. 

(2) In particular, the algorithm does not slow down while extending to large scales. 
Actually, the number of operations required using this method is independent of cell sizes. 
Therefore, this method enable us to perform a massive sampling to reduce statistical errors. 

(3) The algorithm can be, in principle, applied to sampling cells with arbitrary shapes, 
so as their spatial conhgurations could be well dehned mathematically. 

When measuring the counts-in-cells in real galaxy samples, there are two observational 
effects needed to be taken into account. The first is volume incompleteness due to bright star 
masks and complicated survey geometry. In the former, holes around bright stars should be 
excavated, and in the latter some sampling cells beyond boundary contain some fraction of 
space that are not in the survey volume. The another effect is spectroscopic incompleteness. 
There are several schemes suggested so far to correct galaxy counts in cells for real samples 
( e.g. Efstathiou et ah 1990, Croton et al. 2004). For instance, in the Croton et al.’s 
approach applying to the 2dFGRS catalogue, they first estimated the combined spectroscopic 
and volume completeness factor / using the survey mask and then compensate the missed 
volume by expanding the cell such that the resized incomplete cell has the same volume as 
the ideal fully complete cell. Subjecting to the numerical tests, this method is likely to give 
reasonable results. 

This paper makes no attempt to discuss the issue of incompleteness in detail. Here, we 
only brief a simple scheme to tackle this problem. Practically, we may generate a uniform 
particle distribution with the survey mask applied, and then compute its SFCs. When 
estimating galaxy count within a cell in the real data using the MRA-CS method, the same 
measurement is also performed in the marked uniform sample. Actually, measuring the 
counts-in-cells in the marked uniform sample may yield a good estimation of the completeness 
factor / so as it is dense enough. Explicitly, the completeness factor / for a cell is equal to 
its volume-averaged density if the mean density of the unmarked uniform sample is taken 
to be unity. It is basically an alternative to the current Monto-Carlo method for computing 
irregular spatial volumes. In the proceeding step, we may £x an acceptable minimum value 
of the completeness factor fc, sampling cells with f < fc are excluded, otherwise, we may 
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apply Croton et al’s volume compensation algorithm for galaxy counts in the re-scaled cells. 

Statistically, the more information could be extracted by sampling in larger conhgura- 
tion spaces of counts-in-cells. In fact, the non-Gaussianities characterized by the high order 
statistics in the cosmic fields are produced by the highly non-linear mode-mode coupling, 
which may manifest itself in gravitational collapse or merging of dark matter halos. There¬ 
fore, examining the shape dependence of high order statistics might be helpful to understand 
gravitational clustering pattern, e.g., the topology and large scale structure, morphologies 
of voids, hlaments and their networks, and the origin of biasing. It is expected that the 
MRA-CS technique developed in this paper could provide an efficient tool for investigating 
the higher order statistical features in the large scale structure of the universe. 

The MRA-CS scheme has been implemented in a set of subroutines, which is publicly 
available upon request to the author. 
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A. B-splines 

A.l. Definition and Basic Properties 

B-splines are symmetrical, bell shaped functions constructed from the n+1 fold convo¬ 
lution of a rectangular pulse 


1 

if\x\ < 0.5 


0.5 

if\x\ = 0.5 

(Al) 

0 

otherwise 


Pfl 0 ■ 

■ -Pfn 0 /5+(i+i))(a:) 

(A 2 ) 


where the symbol o denotes for operation of convolution. (x) is a piecewise polynomial 
of degree n, 

I 


n+1 


+">(x) = ^(-l)' 


1=0 


n\ 


(A3) 


In practical calculation, the B-splines could be obtained using recursion over the spline order, 

^ (" + l)/2 +V ‘,(x + 1/2) + _ 

n n 

The Fourier transformation of is related to the n+1 fold convolution construction 

of the B-splines, 

y »>(0 = ( A 5 ) 

The n-th order B-spline has a compact support of [—^^, As f3 is not orthogonal 
to its translates, we apply the orthgonalizatio scheme to construct its dual base 7 , such that 

/ OO 

{x - {x -l)dx = 5ki (A 6 ) 

■OO 

It follows from above orthogonality condition that, in the wavenumber space. 




= 


iW(0 


with the periodic function 

OO n 

aW(^)= ^ 


(A7) 


(AS) 


l=—oo 


l=—n 


It is noted that the dual scaling function 7 is not compactly supported. 
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A.2. Kernel Representation Using B-spline Biorthogonal Bases 


For a given kernel W{x,y), it could be decomposed in terms of B-splines (3 (Beylkin & 
Cramer, 2002), 

^{x,y) = ^'Y^Wk-il3{x-k)l3{y-l) (A9) 

k I 

it follows from the orthogonality eq.(A 6 ) that the coefficients {wn} can be found formally 
by 

Wn = J 'jix — n){W o'y)(x)dx (AlO) 

Since 7 can be expressed as a linear combination of B-splines, we have 


= ^qk(3{x-k) 


Substituting eq.(All) into eq.(AlO), we have 

—mCn 


Wn = 


in which. 


and 


Qm — ^ ^ Qk^m+k 
k 

Cii — / — n){W o 0)(x)dx 


(All) 


(A12) 


(A13) 

(AM) 


Taking Fourier transformation on both sides of eq.(All), we may see that the coefficients 
{(/„} are Fourier coefficients of the function l/a(0, and the coefficients {§„} in eq.(A13) are 
Fourier coefficients of the function l/[a(f)]^- Define the trigonometric series w and C by 


— 00 

and 

00 

c(o = E 

— 00 

Thus, Fourier transformation of eq.(A12) implies 



(A15) 


(A16) 


(A17) 
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Obviously, eqs.(Al5)(Al6) and (A17) formulate an algorithm of obtaining Wn directly from 
Cn. 

Let the central B-spline f3 is of odd degree M—1, then support of I3{x) is {x : |a;| < M/2}. 
Hence, eq(A14) can be written in form of 

/ M 

B{x)W{x + n)dx (A18) 

-M 

where B{x) = J (3(x + y)P{y)dy is the autocorrelation of P{x), which is supported on the 
interval {x : |a;| < M}. If the kernel W{x,y) does not have singularity in whole axis 
R, just as either spherical or cubic kernel considered in this paper, the integral eq.(AlO) 
is convergent, and only hnite number of the coefficients Cn are nonzero. For the singular 
operator such as that of the power law l/a;“, the regularization procedure is necessary, the 
integral in eq.(AlO) may fail to converge for all integer n, because 'y{x — n) fails to vanish 
in a neighborhood of the singularity of the kernel W. The detail for regularization scheme 
based on the multiresolution approach could be found in Beylkin and Cramer (2002). 


A.3. Approximation to the (5-Punction 


Let n{x) = 5{x — Xq) in eq.(17), and the scaling function is taken to be B-splines, 
4>j^i{x) = using the dual of P, then we have the hnite representation of ^-function 

from eq.(18) 

5{x- xo) nl{x) = {x) (A19) 

fcez 

It could be proven that, if tc is a test function, w G , then 


w{xq) 


nl{x)w{x)dx + ) 


(A20) 


where h = 2 b j > 0 (Beylkin, 1995). 
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Fig. 1.— Spherical counts in cells: volume-averaged density varies with radius measured in 
a sequence of concentric spherical shells placed in a random sample with 256^ particles. The 
MRA-CS results(circules) are compared with the direct counting (solid line). The upper, 
central and lower panels display the results in three numerical experiments respectively 
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Fig. 2.— The volume-averaged density measured in the Virgo simulation sample as a function 
of cell size for a hxed center of cells. Each panel corresponds to a given aspect ratio as 
indicated in the hgure, and plots two sets of the counts in cells measurements. One is 
made in a massive dark matter halo (upper curves), and another is for a low density region 
(lower curves). The hgure compares the results obtained from the MRA-CS algorithm (open 
circules) with the direct counting (solid line). 





Fig. 3.— Same as fig. (2), but for spherical and 
cells, the aspect ratio is indicated in each panel. 



cylinder counts in cells. For the cylinder 











10'' 10° io' 

( Direct Counting) 

Fig. 4.— Spherical counts in cells: the volume-averaged density calculated by the MRA-CS 
scheme versus those from the direct counting. The measurements were made in the Virgo 
simulation sample. The 3 x 10^ scatters (circule) randomly chosen from a total 256^ sampling 
are displayed. The upper panel is for the cell size R = 5h“^Mpc and the lower panel for 
R = 10h“^Mpc. 
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0.1 1 


Pv 

Fig. 5.— Comparison of the probability distribution function of volume-averaged density 
obtained from the MRA-CS algorithm (solid line) with that by direct counting (circule). The 
counts in cells are performed in the same simulation sample as in hg.2. The upper, middle 
and lower panels display the results for spherical, cubic and cylinder cells. In each panel, the 
right and left curves denote results for cells with the equivalent radius R = 5h“^Mpc and 
R = 10h“^Mpc respectively. 
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Fig. 6.— The two-point correlations function measured with the MRA-CS alogrithm (open 
circules) in the Virgo LCDM sample vs. the nonlinear power spectrum given by Smith et.al 
(2003)(solid line). In the MRA-CS calculation, a 512^ grid has been used. 










